S-Vectors and TESA: Speaker Embeddings and a Speaker Authenticator Based on Transformer Encoder

Authors

Abstract

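One of the most popular speaker embeddings is x-vectors, which are obtained from an architecture that gradually builds a larger temporal context with layers. In this paper, we propose to derive speaker embeddings from a Transformer's encoder trained for speaker classification. Self-attention, on which the Transformer's encoder is built, attends to all features over the entire utterance and might be more suitable for capturing the speaker characteristics in an utterance. We refer to the speaker embeddings obtained from the proposed speaker classification model as s-vectors to emphasize that they are obtained from an architecture that relies heavily on self-attention. Through experiments, we demonstrate that s-vectors perform better than x-vectors. In addition to s-vectors, we also propose a new architecture based on the Transformer's encoder for speaker verification, as a replacement for speaker verification based on conventional probabilistic linear discriminant analysis (PLDA). This architecture is inspired by the next sentence prediction task of bidirectional encoder representations from Transformers (BERT), and we feed the embeddings of two utterances to verify whether they belong to the same speaker. We name this architecture the Transformer encoder speaker authenticator (TESA). Our experiments show that the performance of s-vectors with TESA is better than that of s-vectors with conventional PLDA-based speaker verification.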
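To illustrate the contrast the abstract draws, the following is a minimal sketch of how self-attention lets every frame attend to all frames of an utterance in a single step. It is not the authors' model: the query, key, and value projections are replaced by the identity, there is a single head and a single layer, and the mean pooling used to form one utterance-level vector is an assumption made here for illustration only. The names `self_attention`, `mean_pool`, and the toy `frames` values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Single-head scaled dot-product self-attention.

    Assumption: identity query/key/value projections, so the only
    thing shown is the attention pattern over the whole utterance.
    """
    d = len(frames[0])
    outputs, weights = [], []
    for q in frames:
        # Score this frame against every frame in the utterance.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted sum of all frames.
        outputs.append(
            [sum(w[t] * frames[t][j] for t in range(len(frames))) for j in range(d)]
        )
    return outputs, weights


def mean_pool(frames):
    """Average frame vectors into one utterance-level vector (assumed pooling)."""
    n = len(frames)
    return [sum(f[j] for f in frames) / n for j in range(len(frames[0]))]


# Toy utterance: 4 frames of 2-dimensional features (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
contextual, attn = self_attention(frames)
embedding = mean_pool(contextual)
```

Each row of `attn` has one weight per frame of the utterance, and every weight is non-zero, so even the first frame draws on the last one. An architecture that widens its temporal context layer by layer only reaches that span after several layers.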

Similar resources

Deep Speaker Embeddings for Short-Duration Speaker Verification

The performance of a state-of-the-art speaker verification system is severely degraded when it is presented with trial recordings of short duration. In this work we propose to use deep neural networks to learn short-duration speaker embeddings. We focus on the 5s-5s condition, wherein both sides of a verification trial are 5 seconds long. In our previous work we established that learning a non-...

Robust speaker identification based on selective use of feature vectors

A new method for speaker identification that selectively uses feature vectors for robust decision-making is described. Experimental results, with short speech segments ranging from 0.25 to 2 s, showed that our method consistently outperforms other approaches, yielding relative improvements of 20–51% and 15–30% over baseline GMM and the LDA-GMM systems, respectively.

Task-Based Language Teaching in Iran: A Mixed Study through Constructing and Validating a New Questionnaire Based on Theoretical, Sociocultural, and Educational Frameworks

Many aspects of life in Iran, including lifestyle, science, and technical and technological resources, can be regarded as more or less imported. The English language and the methods used to teach it are no exception. Nevertheless, the question sometimes arises whether a particular method is compatible with the theoretical, sociocultural, and educational infrastructure of Iranian society. This research was carried out using mixed methods. A questionnaire for language learners was also ...

Deep Speaker Vectors for Semi Text-independent Speaker Verification

Recent research shows that deep neural networks (DNNs) can be used to extract deep speaker vectors (d-vectors) that preserve speaker characteristics and can be used in speaker verification. This new method has been tested on text-dependent speaker verification tasks, and improvement was reported when combined with the conventional i-vector method. This paper extends the d-vector approach to sem...

Invited Speaker Abstracts

No abstract is available for this article.

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2022

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2021.3134566